Abstract

In this paper, we consider a cognitive multi-hop relay secondary user (SU) system sharing the spectrum with some primary users (PUs). The transmit power as well as the hop selection of the cognitive relays can be dynamically adapted according to the local (and causal) knowledge of the instantaneous channel state information (CSI) in the multi-hop SU system. We shall determine a low complexity, decentralized algorithm to maximize the average end-to-end throughput of the SU system with dynamic spatial reuse. The problem is challenging due to the decentralized requirement as well as the causality constraint on the knowledge of CSI. Furthermore, the problem belongs to the class of stochastic Network Utility Maximization (NUM) problems, which is quite challenging. We exploit the time-scale difference between the PU activity and the CSI fluctuations and decompose the problem into a master problem and subproblems. We derive an asymptotically optimal low complexity solution using divide-and-conquer and illustrate that significant performance gain can be obtained through dynamic hop selection and power control. The worst case complexity and memory requirement of the proposed algorithm are $O(M^2)$ and $O(M^3)$, respectively, where $M$ is the number of SUs.

I. Introduction

Cooperative Communication and Dynamic Spectrum Access (DSA) are two important technologies that drive the evolution of next generation wireless systems. For instance, cooperative communication [1], [2] exploits the broadcast nature of the wireless channel, enhances the reliability of packet delivery against channel fading and hence increases the coverage of wireless systems. Many works have studied multi-hop relay networks. In [3], the authors analyzed the performance of dual-hop relaying communications over fading channels. Performance bounds of multi-hop relay systems are analyzed in [4]. However, these works did not consider dynamic resource adaptation in the relay system. In [5], the authors investigated the minimum energy per bit, treating both capacity and power consumption as optimization parameters in the wireless ad hoc network. The minimization of the transmit power under the assumption of orthogonal transmissions was studied in [6], [7], in which the optimal parallel-relay channel power allocation for Amplify and Forward (AF) and Decode and Forward (DF) was derived. However, in all these works, the power control solution adapts to the path loss only and fails to exploit the dynamic fluctuations of microscopic fading. In [8], the authors considered dynamic power control for multi-hop relay, but the solution is centralized and requires knowledge of the global channel state information of the entire ad hoc network, which is very difficult to realize in practice. Furthermore, a fixed number of hops to deliver a packet to the destination is always assumed in the above works. Due to the store-and-forward penalty in the end-to-end throughput of multi-hop relaying, it is not always optimal to involve a fixed number of hops in the multi-hop network. To tackle this issue, various opportunistic multi-hop relaying protocols were proposed in [9], [10], [11]. In these designs, the number of hops to deliver a packet to the destination node changes dynamically according to the channel conditions. However, in these works, the opportunistic multi-hop protocols are heuristic in nature and the performance is studied by simulation and empirical measurements. In [12], performance analysis of a one-hop relay protocol is studied. In [13], [14], performance analysis of some simple opportunistic multi-hop relaying protocols is studied. Furthermore, they all assume constant transmit power and deterministic channels, where the effects of random fading are ignored.

On the other hand, DSA is an important new paradigm of spectrum access in which a secondary system dynamically shares the medium with a higher priority primary system. Using cognitive radios (CRs) [15], [16], the nodes in the secondary user (SU) systems sense the activity of the primary users (PUs) and access the spectrum only if the primary system is idle. In other words, the SU system dynamically shares the spectrum with the PU systems by exploiting the burstiness of the PU traffic in the temporal, frequency and spatial domains. One key issue of DSA or CR is the efficiency of spectrum sharing between the SU and PU systems. In [17], [18], the authors considered a CR system based on the interference avoidance approach, in which the SU can transmit only if there are no active PUs within the coverage of the SU system. While such an approach exploits the burstiness of the PU activity without requiring knowledge of the PU signal structure, the access opportunity of the SU system will be quite low for SUs separated by a large distance, as such an access opportunity exists only if all the PUs along the SU coverage are idle simultaneously. As a result, cognitive multi-hop relaying for the SU systems is a promising solution to this issue of low probability of access for distant secondary users. While, intuitively, cognitive multi-hop relaying could significantly enhance the spectrum sharing efficiency between the SU and PU systems, there are still a number of technical challenges to overcome, as listed below.

• Jointly Optimal Opportunistic Hop Selection and Power Control for Cognitive 
Multi-hop relays: Most of the existing works only considered either the power control 
[HI, [m or the opportunistic multi-hop relaying protocols. It is very important to jointly 
optimize both the forward hopping strategy and the power control policy to exploit 
the instantaneous fluctuations of PU activities and the microscopic fading in order to 
improve the performance of the cognitive multi-hop relays. 

• Dynamic Spatial Reuse in Cognitive Multihop Relaying: In most of the existing 
works studying power control or forward hopping in multihop relay lIH, ifTOll . [fTTll . they 
focus entirely on the multihop aspects of the problem and assume that the multi-hop 
network does not have to share spectrum with any PU systems. This simplifies the 
problem significantly. While this is a reasonable assumption in the regular multihop 
network without PU, such symmetric spatial reuse is not always possible in cognitive 
multihop relay network due to the random PU activities on any hops. 

• Decentralized Solution with Local Knowledge of Channel State Information (CSI): 
An additional level of difficulty in solving the forward hopping and power control 
problem is the requirement of decentralized solution. In practice, it is very difficult to 
obtain and keep track of an up-to-date knowledge of the instantaneous channel state 
information for the entire multi-hop network. As a result, it is desirable to have a 
decentralized solution which requires knowledge of local (rather than global) channel 
state information only. In [20], the authors considered a distributed resource management
scheme for multi-hop CR networks but no power control is considered and the solution 
is based on heuristic design. 

• Causal Knowledge of Channel States in the Multi-hop Relay Network: In most of the existing works [8], not only global knowledge but also non-causal^1 knowledge of channel states in the multi-hop network is assumed. Specifically, at $t = 0$, the centralized controller is assumed to have knowledge of all the channel states in all the hops of the entire multi-hop relay network. However, by the time the packets are delivered in the $n$-th hop, the actual channel state may have changed, and the constraint of having only causal knowledge of channel states has not been taken into account in the previous works on power optimization in multi-hop relay networks.

In this paper, we address the above technical challenges. We consider a cognitive multi-hop SU system with a source, a destination and M half-duplex cognitive relays scattered between the source and the destination. The SU system dynamically shares the spectrum with a PU system (with many PU nodes). The transmit power of the SU nodes as well as the hopping sequence of the cognitive relays are adapted according to the local (and causal) knowledge of channel states in the multi-hop SU system to optimize the average end-to-end throughput. The solution also accommodates dynamic spatial reuse across the cognitive multi-hop system. The problem belongs to the class of stochastic NUM^2 problems, which is well known to be challenging. To obtain a decentralized solution for the throughput optimization problem, we exploit the time-scale difference between the PU activity and the CSI fluctuations and decompose the problem into a master problem and subproblems (operating at different time scales). To deal with the causality requirement^3, we express the subproblems in recursive form and solve them using divide-and-conquer. We show that significant performance gains in the throughput of the SU system can be obtained using joint forward hopping and power control over a wide range of PU activity. Furthermore, we show that the decentralized solution has worst case complexity of $O(M^2)$ and is asymptotically optimal for large $M$.

^1 Causality here refers to whether the source knows about the future channel states along the entire multi-hop transmission event from the source to the destination. In existing works, one way to justify the "non-causal knowledge" is to assume that the channel state remains quasi-static across the sum of frame durations in the multi-hop transmission from the source to the destination.

^2 Stochastic NUM refers to a Network Utility Maximization problem where the objective function involves an expectation w.r.t. the stochastic system state and the optimization variables involve not just actions at a given system state realization but rather a collection of actions for all system state realizations. This is a challenging problem because of the huge dimension of the variables involved as well as the lack of an explicit closed-form expression for the objective function in terms of the control policy.

^3 In our paper, we allow the CSI to be time varying across different hops in the multi-hop transmission, and the control policy is adaptive to the current information (but not the future CSI knowledge) only.

II. System Model, Control Policy and End-to-End Throughput 

Fig. 1 illustrates the system model of the cognitive multi-hop relay system. The SU system consists of a cognitive source, a destination and several randomly distributed relays.

Assumption 2.1: The system adopts a certain Layer 3 (network layer) protocol to determine a route from the SU source node to the SU destination node, where the route is defined as a sequence of ordered nodes $\mathcal{R} = \langle R_0, R_1, \ldots, R_M \rangle$, where $R_0$ and $R_M$ are the source node and the destination node, respectively. This route is assumed to be fixed throughout the communication session.

Denote the source as $R_0$, the destination as $R_M$ and the $M-1$ cognitive relays as $\{R_1, \ldots, R_{M-1}\}$, which are distributed between $R_0$ and $R_M$. The PU system consists of short-range wireless systems where the PU nodes are assumed to be distributed uniformly (with a density of $\rho_p$) over the SU coverage area. Each PU node is assumed to have bursty activity with an active probability of $P_a$. The PU and the SU systems share a common frequency spectrum and the SU system can access the channel only when all the involved PU nodes are idle. In the following, we shall elaborate on the channel model, the control policy and the end-to-end throughput of the SU cognitive multi-hop relay system.

A. Channel Model 

Figure 2 illustrates the signaling flow in the multi-hop relay system. For the sensing of PU activity, we adopt the distributed sensing and centralized data fusion model as in IEEE 802.22. For instance, there are periodic quiet periods in the SU system that enable the sensing of PU activity. During the quiet periods, the SUs sense the PU activity locally and send the sensing results to the other SU nodes. The SUs exchange the sensing results and update the continuous segment (to be defined in the next subsection) information for data fusion. Define $A_m \in \{0, 1\}$ as the sensing result, which represents the availability of the shared spectrum to the SU system ($A_m = 1$ denotes that the shared spectrum is available to SU node $R_m$), and let $\mathbf{A} = (A_1, \ldots, A_M)$ be the vector of PU activity states for the $M$ SU nodes. We assume an SU node $R_m$, $m \in \{0, 1, \ldots, M\}$, has access ($A_m = 1$) to the shared spectrum if and only if the nearest active PU node is at least $D_0$ meters away from the SU node^4. Furthermore, assume that $\mathbf{A}$ remains quasi-static between two consecutive sensing periods. This is a reasonable assumption as the burstiness of the PU nodes is of a longer time scale compared to the packet frame duration.

^4 $D_0$ is determined by the mean interference constraint to the PU. For instance, denote $P_{int}$ as the interference constraint from SU to PU and $P_0$ as the mean transmit power of the SU; then $D_0 \geq (P_0 / P_{int})^{1/\alpha}$, where $\alpha$ is the path loss factor.
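The access rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `protection_distance` and `sensing_result` are hypothetical, node positions are taken to be planar coordinates, and the protection distance is taken at the boundary of the interference constraint in the footnote, i.e. $D_0 = (P_0/P_{int})^{1/\alpha}$.

```python
import math

def protection_distance(p0, p_int, alpha):
    """Smallest D0 such that p0 * D0**(-alpha) <= p_int (assumed boundary case)."""
    return (p0 / p_int) ** (1.0 / alpha)

def sensing_result(su_pos, active_pu_positions, d0):
    """Return A_m = 1 if every active PU is at least d0 away from the SU node, else 0."""
    for pu in active_pu_positions:
        if math.dist(su_pos, pu) < d0:
            return 0
    return 1

# Hypothetical numbers: mean SU power 100, interference limit 1, path loss factor 2.
d0 = protection_distance(100.0, 1.0, 2.0)
a_m = sensing_result((0.0, 0.0), [(30.0, 40.0)], d0)  # PU at distance 50 from the SU
```

An SU node with no active PU nearby always obtains $A_m = 1$ in this sketch, which matches the interference avoidance rule stated in the text.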

The received signal at the $j$-th SU node from the $i$-th SU node at the $k$-th frame is given by:

$$Y_{ij}(k) = H_{ij}(k)\sqrt{D_{ij}}\,X_{ij}(k) + Z_{ij} \tag{1}$$

where $X_{ij}(k)$ is the transmitted data symbol from node $i$ to node $j$, $Z_{ij}$ is the zero-mean complex Gaussian channel noise (with normalized variance 1) and $G_{ij}(k) = |H_{ij}(k)|^2 D_{ij}$ is the combined channel loss (including both the large-scale path loss $D_{ij}$ and the microscopic fading $H_{ij}$) between nodes $i$ and $j$. The microscopic fading $H_{ij}$ is modeled as zero-mean, unit-variance complex Gaussian i.i.d. (independent for different users) random variables. Let $\mathbf{G}(k) = \{G_{ij}(k) : i \neq j,\ i, j \in \{0, 1, \ldots, M\}\}$ be the global channel state (GCS) information. We assume $\mathbf{G}$ is quasi-static within a frame. For practical considerations, we have the following restrictions on the knowledge of the channel states.
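As a rough illustration of the channel model in (1), the following Python sketch draws one realization of the combined channel gain $G_{ij}(k) = |H_{ij}(k)|^2 D_{ij}$. The helper names and the nested-list layout of the path loss table are assumptions made for this example only.

```python
import math
import random

def sample_gain(path_loss, rng):
    """Draw G = |H|^2 * D with H zero-mean, unit-variance complex Gaussian."""
    sigma = math.sqrt(0.5)  # real and imaginary parts each carry half the variance
    h = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return abs(h) ** 2 * path_loss

def sample_global_state(path_loss, rng):
    """Draw one global channel state; entries with i == j are undefined (None)."""
    n = len(path_loss)
    return [[sample_gain(path_loss[i][j], rng) if i != j else None
             for j in range(n)] for i in range(n)]

rng = random.Random(0)
g_state = sample_global_state([[0.0, 1.0], [1.0, 0.0]], rng)
```

Since $|H_{ij}|^2$ has unit mean, the sample average of `sample_gain(d, rng)` approaches the path loss `d`, which is a quick sanity check on the normalization.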

• Local Knowledge of Channel States: We assume each SU node only has knowledge of the local channel state (LCS, to be defined below) and the global PU activity state $\mathbf{A}$ (which remains quasi-static between two consecutive sensing periods).

• Causal Knowledge of Channel States: We assume that each SU node only has causal 
knowledge of the LCS and cannot predict into the future. 

Specifically, we assume that at the $k$-th frame, SU node $m$ only has knowledge of the current LCS $\mathbf{G}_m(k)$. Here, $\mathbf{G}_m(k) = \{G_{mi}(k),\ i \in \{m+1, \ldots, j\}\}$, in which $j$ satisfies $S_{m+1} = \ldots = S_j = 1$, $S_{j+1} = 0$, is the local CSI at the $k$-th frame.

B. System State, Hopping and Power Control Policy, System State Transition Kernel. 

In this section, we shall formally define the control policy in the cognitive multi-hop 
relaying system. The multi-hop relay network operates in a DF manner with half-duplex 
constraint. At each frame, the upstream SU node transmits a packet of $B$ bits to its downstream nodes using a transmit power which can be dynamically adjusted based on the current LCS knowledge. The downstream SU node(s) must decode the $B$-bit packet before forwarding it to the next hop.

In this paper we consider dynamic spatial reuse in the cognitive multi-hop relay system 
as illustrated in Figure HI For any given PU states A, the multi-hop relay chain will be 
partitioned into several segments, which are defined as follows:

Definition 2.1 (Continuous Segment in Route $\mathcal{R}$): A continuous segment $L_{ij}$ in the cognitive multi-hop relay chain is defined as a sequence of nodes $\langle R_i, \ldots, R_j \rangle \subset \mathcal{R}$ such that:

$$S_{i-1} = 0,\ S_i = \ldots = S_j = 1,\ S_{j+1} = 0, \quad i, j \in \{1, 2, \ldots, M\} \tag{2}$$

(Define $S_{-1} = S_{M+1} = 0$.) The nodes $R_i$ and $R_j$ are called the head-node and the end-node of the continuous segment, respectively. Define the probability that $\{R_i, \ldots, R_j\}$ forms a continuous segment as $\Pr(i, j) = \Pr(S_{i-1} = 0,\ S_i = \ldots = S_j = 1,\ S_{j+1} = 0)$. ■
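The partition in Definition 2.1 amounts to finding maximal runs of ones in the availability vector. The sketch below is illustrative only: `continuous_segments` is a hypothetical helper, and `segment_probability` additionally assumes that the availabilities $S_m$ are i.i.d. with $\Pr(S_m = 1) = q$, which the paper does not state and which is used here only to make $\Pr(i, j)$ computable in closed form.

```python
def continuous_segments(S):
    """Return (head, end) index pairs of the maximal runs of 1s in S = [S_0, ..., S_M]."""
    segments, head = [], None
    for idx, s in enumerate(S):
        if s == 1 and head is None:
            head = idx                       # a new segment starts here
        elif s != 1 and head is not None:
            segments.append((head, idx - 1)) # the run ended at the previous node
            head = None
    if head is not None:                     # run reaching R_M (S_{M+1} is defined as 0)
        segments.append((head, len(S) - 1))
    return segments

def segment_probability(i, j, M, q):
    """Pr(i, j) under the ASSUMED i.i.d. model Pr(S_m = 1) = q, with S_{-1} = S_{M+1} = 0."""
    left = 1.0 if i == 0 else (1.0 - q)
    right = 1.0 if j == M else (1.0 - q)
    return left * q ** (j - i + 1) * right

# With S_4 = 0 in a 7-node chain, the chain splits into two segments.
parts = continuous_segments([1, 1, 1, 1, 0, 1, 1])
```

Relays in different returned pairs may transmit simultaneously, while relays inside one pair are subject to the TDMA constraint discussed next.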

Spatial reuse is allowed only for relays in different segments of the partition. Hence, relays in different segments can transmit different information simultaneously without interfering with each other. Packets are stored at the end-node of each continuous segment, and the end-node is not allowed to transmit until the downstream PU activity becomes idle. However, relays within one continuous segment have to obey the TDMA constraint and cannot transmit different information simultaneously at any given time.

Within a continuous segment $L_{ij}$ induced by the PU activity $\mathbf{A}$, we shall define the hopping and power control policy as follows:

Definition 2.2 (System State of Segment $L_{ij}$): Suppose $R_i \sim R_j$ is a continuous segment $L_{ij}$ induced by a PU activity state $\mathbf{A}$. The system state of $L_{ij}$ at frame index^5 $k \in \{1, 2, \ldots, j-i\}$ is given by $\eta_{ij}(k) = \{s_{ij}(k), \mathbf{G}_{s_{ij}(k)}(k)\}$, where $s_{ij}(k) \in \{i, i+1, \ldots, j\}$ denotes the index of the source node at frame $k$, $s_{ij}(1) = i$, and $\mathbf{G}_{s_{ij}(k)}(k)$ is the LCS at node $s_{ij}(k)$. ■

Definition 2.3 (Control Policy $\Omega_{ij}$ in Segment $L_{ij}$): A stationary policy $\Omega_{ij}$ is a mapping from the current system state $\eta_{ij}(k)$ to the corresponding hopping and power control actions. The policy is $\Omega_{ij} = \{\mathcal{L}_{ij}, \mathcal{P}_{ij}\}$, where:

• Forward hopping policy $\mathcal{L}_{ij}$: $l_{ij}(k) = \mathcal{L}_{ij}(\eta_{ij}(k))$, $k \in \{1, 2, \ldots, j-i\}$, where the hopping control action (destination node index at frame $k$) has to satisfy the constraint $s_{ij}(k) \leq l_{ij}(k) \leq j$, with the left inequality holding strictly when $s_{ij}(k) < j$.

• Dynamic power control policy $\mathcal{P}_{ij}$: $P_{ij}(k) = \mathcal{P}_{ij}(\eta_{ij}(k))$, $k \in \{1, 2, \ldots, j-i\}$, where the power control action (transmit power at frame $k$) shall satisfy $P_{ij}(k) \geq 0$.



^5 The frame index $k$ is equal to the number of hops already experienced by the packet currently being transmitted in a continuous segment and is reset to 1 when this packet is successfully delivered to the end-node. Hence, $k$ might differ from segment to segment.
Definition 2.4 (System State Transition Kernel): The source node at the $(k+1)$-th frame, $s_{ij}(k+1)$, is determined by the hopping control action in the previous frame, $l_{ij}(k)$. Furthermore, the distribution of the channel state $\mathbf{G}_{s_{ij}(k+1)}$ is independent of the previous system state $\eta_{ij}(k)$ due to the causal knowledge assumption. Hence, the state transition kernel of the system state $\{\eta_{ij}(k)\}$ is given by:

$$\Pr\big(\eta_{ij}(k+1) \mid \eta_{ij}(k), \Omega_{ij}\big) = \mathbf{1}\big(s_{ij}(k+1) = l_{ij}(k)\big) \cdot \Pr\big(\mathbf{G}_{s_{ij}(k+1)}\big) \tag{3}$$



Remark 1: Strictly speaking, the forward hopping policy $\mathcal{L}$ does not contain all possible hopping sequences w.r.t. a given route $\mathcal{R}$. For example, potential loops (e.g. $R_i \to R_j \to R_i$) are excluded. Note that it is an intractable problem to optimize w.r.t. general hopping policies (including loops) due to the enormous number of possible policies involved. Instead, we shall restrict ourselves to forward hopping policies only, which allows us to exploit the structure in the policy space to derive much simpler solutions. ■

C. End-to-End Throughput with Dynamic Spatial Reuse and Forward Hopping Control 

In order for an SU node to forward a packet in any continuous segment, the node itself must be able to decode the packet first (DF). Suppose a node is able to decode if and only if the total mutual information received is no less than $B$ bits. Hence, we have:

$$T_{ij}(k) \cdot \log\big(1 + G_{s_{ij}(k)\,l_{ij}(k)}(k)\,P_{ij}(k)\big) \geq B, \quad k \in \{1, 2, \ldots, j-i\},\ s_{ij}(k) < j \tag{4}$$

where $i, j$ satisfy (2) and $T_{ij}(k)$ is the transmission time of the $k$-th frame in continuous segment $L_{ij}$. We first formally define the per-hop reward and cost below.

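Definition 2.5 (Per Hop Reward and Per Hop Cost): Define the reward at the $k$-th frame as the time taken to transmit 1 bit at the $k$-th frame:

$$\tau(\eta_{ij}(k), \Omega_{ij}) = \begin{cases} \dfrac{1}{\log\big(1 + G_{s_{ij}(k)\,l_{ij}(k)}(k)\,P_{ij}(k)\big)} & \text{when } s_{ij}(k) < j \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

Define the cost at the $k$-th frame as the power consumed to transmit 1 bit at the $k$-th frame:

$$P(\eta_{ij}(k), \Omega_{ij}) = \begin{cases} \dfrac{P_{ij}(k)}{\log\big(1 + G_{s_{ij}(k)\,l_{ij}(k)}(k)\,P_{ij}(k)\big)} & \text{when } s_{ij}(k) < j \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$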
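To make the per-hop quantities concrete, here is a small Python sketch of the reward and cost in Definition 2.5 and of the per-packet rate obtained by summing the rewards along a hop sequence. The function names are hypothetical, the decoding condition (4) is assumed to hold with equality, and the natural logarithm is used.

```python
import math

def per_hop_reward(gain, power):
    """Time needed to deliver one bit over a hop with the given gain and transmit power."""
    return 1.0 / math.log(1.0 + gain * power)

def per_hop_cost(gain, power):
    """Power spent per delivered bit on the hop (transmit power times time per bit)."""
    return power * per_hop_reward(gain, power)

def packet_rate(hops):
    """Bits per unit time for one packet traversing hops = [(gain, power), ...]."""
    return 1.0 / sum(per_hop_reward(g, p) for g, p in hops)

# Hypothetical comparison: one long hop with a weak gain versus two short hops.
direct = packet_rate([(0.1, 10.0)])
relayed = packet_rate([(1.0, 10.0), (1.0, 10.0)])
```

Which of the two options yields the larger rate depends on the gains and powers, which is exactly the store-and-forward trade-off that motivates dynamic hop selection.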
Note that $T_{ij}(k) = B \cdot \tau(\eta_{ij}(k), \Omega_{ij})$ and hence the average data rate in the continuous segment $L_{ij}$ can be expressed as:

$$\bar{U}_{ij} = E^{\Omega_{ij}}\left[\frac{B}{\sum_{k=1}^{j-i} T_{ij}(k)}\right] = E^{\Omega_{ij}}\left[\frac{1}{\sum_{k=1}^{j-i} \tau(\eta_{ij}(k), \Omega_{ij})}\right] \tag{7}$$

where the expectation $E^{\Omega_{ij}}$ is taken w.r.t. the probability measure induced by the control policy $\Omega_{ij}$ and the transition kernel in (3). Similarly, the average power consumption $\bar{P}_{ij}$ in $L_{ij}$ can be expressed as:

$$\bar{P}_{ij} = E^{\Omega_{ij}}\left[\frac{\sum_{k=1}^{j-i} P(\eta_{ij}(k), \Omega_{ij})}{\sum_{k=1}^{j-i} \tau(\eta_{ij}(k), \Omega_{ij})}\right] \tag{8}$$

The end-to-end average throughput of the cognitive multi-hop system can be written as the weighted sum of the average data rates of all continuous segments with end-node $R_M$:

$$U(\Omega) = \sum_{i=0}^{M-1} \Pr(i, M)\,\bar{U}_{iM} \tag{9}$$

The average sum-power constraint is given by:

$$\sum_{i=0}^{M-1} \sum_{j=i+1}^{M} \Pr(i, j)\,\bar{P}_{ij} \leq P_0 \tag{10}$$

Moreover, the conventional flow-balance constraint^6 is given by:

$$\sum_{i=0}^{m-1} \Pr(i, m)\,\bar{U}_{im} \geq \sum_{j=m+1}^{M} \Pr(m, j)\,\bar{U}_{mj}, \quad \forall m \in \{1, \ldots, M-1\} \tag{11}$$

III. Problem Formulation 

Note that the conventional flow-balance constraint in (11) may not be convex^7. To solve this issue, we introduce a new balance criterion, namely the section flow-balance criterion. Specifically, we consider the sum of the average data rates passing through each section (rather than each node), i.e., the sum-average data rate passing through the $m$-th section ($m \in \{1, 2, \ldots, M\}$, as illustrated in Fig. 5). Define: $U_m = \sum_{i=0}^{m-1} \sum_{j=m}^{M} \Pr(i, j)\,\bar{U}_{ij}$. The section flow-balance criterion is given by:

$$U_m \geq U_{m+1}, \quad \forall m \in \{1, \ldots, M-1\} \tag{12}$$
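The section rates $U_m$ are simple double sums, as the following Python sketch shows. The function names and the table layout (`pr[i][j]` for $\Pr(i, j)$ and `u_bar[i][j]` for $\bar{U}_{ij}$, both indexed from 0 to $M$) are assumptions for illustration; the numbers are made up.

```python
def section_rates(pr, u_bar, M):
    """Return [U_1, ..., U_M] with U_m = sum_{i<m} sum_{j>=m} Pr(i,j) * U_ij."""
    return [sum(pr[i][j] * u_bar[i][j]
                for i in range(m) for j in range(m, M + 1))
            for m in range(1, M + 1)]

def satisfies_section_balance(rates):
    """Check the section flow-balance criterion U_m >= U_{m+1} for all m."""
    return all(a >= b for a, b in zip(rates, rates[1:]))

# Hypothetical tables for M = 2 (nodes R_0, R_1, R_2); only entries with i < j matter.
pr = [[0.0, 0.2, 0.5], [0.0, 0.0, 0.3], [0.0, 0.0, 0.0]]
u_bar = [[0.0, 1.0, 2.0], [0.0, 0.0, 3.0], [0.0, 0.0, 0.0]]
rates = section_rates(pr, u_bar, 2)
balanced = satisfies_section_balance(rates)
```

In this made-up example the second section carries more than the first, so the criterion is violated and such a rate assignment would not be admissible.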

In the following lemma, we shall illustrate that the section flow-balance criteria is in fact 
equivalent to the conventional per-node flow-balance: 

^6 The conventional flow balance constraint ensures that the output flow does not exceed the input flow at any SU node.
^7 A convex (concave) function minus another convex (concave) function is neither convex nor concave in general.
Lemma 3.1 (Equivalence of the flow balance criteria): The conventional per-node flow balance constraint in (11) is equivalent to the per-section flow balance criterion in (12).

Proof: Please refer to Appendix A for the proof. ■

Lemma 3.1 gives an equivalent form of the traditional flow-balance criterion. Moreover, note that the objective $U(\Omega)$ in (9) is equal to:

$$U(\Omega) = \sum_{i=0}^{M-1} \Pr(i, M)\,\bar{U}_{iM} = \sum_{i=0}^{M-1} \sum_{j=M}^{M} \Pr(i, j)\,\bar{U}_{ij} = U_M = \min(\{U_1, U_2, \ldots, U_M\}) \tag{13}$$

where the last equality is due to the section flow balance criterion (12) and $\Omega$ is the overall control policy: $\Omega = \{\Omega_{ij},\ \forall i, j$ that satisfy (2) under a PU activity state $\mathbf{A}\}$.

From (13), the optimization problem can be formulated as:

Problem 1 (Original Problem):

$$U = \max_{\Omega}\ \min_{m \in \{1, \ldots, M\}} \sum_{i=0}^{m-1} \sum_{j=m}^{M} \Pr(i, j)\,\bar{U}_{ij} \tag{14}$$

Subject to:

$$\sum_{i=0}^{M-1} \sum_{j=i+1}^{M} \Pr(i, j)\,\bar{P}_{ij} \leq P_0 \tag{15}$$

where $\bar{U}_{ij}$ and $\bar{P}_{ij}$ are given by (7) and (8), respectively.

A. Decomposition of Main Problem

The optimization problem in (14) is too complex to solve directly. Furthermore, due to the causality constraint on the control policies $\mathcal{P}$ and $\mathcal{L}$, the solution is not trivial and a brute-force approach does not lead to viable solutions. However, it is worth noting that for a given PU activity state $\mathbf{A}$, operations on different continuous segments are naturally separated from each other (e.g., in the dynamic spatial reuse illustration, when $S_4 = 0$, the hopping and power control policy in the segment $R_0 \sim R_3$ has no direct influence on that in the segment beginning at $R_5$). Making use of this insight, we shall first decompose the problem into a master problem and subproblems. Define:

$$\mathcal{P}_{main} = \{P_{ij}\}, \quad i, j \in \{0, 1, \ldots, M\},\ i < j.$$

We have the following decomposition result:

Lemma 3.2: The optimization problem consisting of a master problem (Problem 2, with $\mathcal{P}_{main}$ as the optimization policy) and $\frac{M(M+1)}{2}$ subproblems (Problem 3, with $\mathcal{L}_{ij}$, $\mathcal{P}_{ij}$ as the optimization policies) is equivalent to Problem 1.
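Problem 2 (Master Problem):

$$U = \max_{\mathcal{P}_{main}}\ \min_{m \in \{1, \ldots, M\}} \sum_{i=0}^{m-1} \sum_{j=m}^{M} \Pr(i, j)\,U^*_{ij}(P_{ij}) \tag{16}$$

Subject to:

$$\sum_{i=0}^{M-1} \sum_{j=i+1}^{M} \Pr(i, j)\,P_{ij} \leq P_0 \tag{17}$$

Problem 3 (Subproblem):

$$U^*_{ij}(P_{ij}) = \max_{\Omega_{ij}} E^{\Omega_{ij}}\left[\frac{1}{\sum_{k=1}^{j-i} \tau(\eta_{ij}(k), \Omega_{ij})}\right] \tag{18}$$

Subject to:

$$\bar{P}_{ij} \leq P_{ij} \tag{19}$$

Proof: Please refer to Appendix B for the proof. ■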

IV. Decentralized Hop Selection and Power Control Algorithm 
A. Solving the Sub Problem 

To satisfy the causality constraint of the control policy on the local CSI, we have to model the subproblem in a recursive form so as to apply dynamic programming (DP) [22]. However, problem (18) cannot be expressed in a recursive form and hence cannot be solved by divide-and-conquer directly. To tackle this challenge, we shall solve a lower bound version of the problem. We shall show that the lower bound solution is indeed asymptotically tight for a large number of nodes.

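1) Asymptotically Optimal Solution: We first elaborate a suboptimal solution for the subproblem (Problem 3). Let

$$\Omega^{low}_{ij} = \arg\min_{\Omega_{ij}} E^{\Omega_{ij}}\left[\sum_{k=1}^{j-i} \Big(\tau(\eta_{ij}(k), \Omega_{ij}) + \lambda_{ij}\big(P(\eta_{ij}(k), \Omega_{ij}) - P_{ij}\,\tau(\eta_{ij}(k), \Omega_{ij})\big)\Big)\right] \tag{20}$$

where the parameter $\lambda_{ij}$ in the suboptimal solution $\Omega^{low}_{ij}$ is given by the root of the equation^8:

$$E^{\Omega^{low}_{ij}}\left[\frac{\sum_{k=1}^{j-i} P(\eta_{ij}(k), \Omega^{low}_{ij})}{\sum_{k=1}^{j-i} \tau(\eta_{ij}(k), \Omega^{low}_{ij})}\right] = P_{ij} \tag{21}$$

Note that the solution $\Omega^{low}_{ij}$ is a feasible but suboptimal solution of the subproblem (Problem 3). We have the following lemma about the property of the suboptimal solution $\Omega^{low}_{ij}$.

^8 For any given $\lambda_{ij}$, $\Omega^{low}_{ij}$ is determined by (20). Substituting this policy into (21), the LHS becomes a function of $\lambda_{ij}$.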
July 29, 2010 DRAFT 
Lemma 4.1 (Asymptotic Optimality of $\Omega^{low}_{ij}$): If the following conditions are satisfied: 1) for any $\epsilon > 0$, there exists a finite $C > 0$ such that when $|s - t| > C$, $G_{st} < \epsilon$; and 2) $G_{st} > G_{st'}$, $G_{st} > G_{s't}$ when $t' > t > s > s'$; then we have $U^{low}_{ij}(P_{ij}) \to U^*_{ij}(P_{ij})$ as $|j - i| \to \infty$, where $U^{low}_{ij}(P_{ij})$ is the average throughput of the segment $R_i \sim R_j$ using the suboptimal control $\Omega^{low}_{ij}$. ■

Proof: Please refer to Appendix O for the proof. ■ 

Remark 2 (Physical Interpretations of Conditions (1) and (2) in Lemma 4.1): Condition 1) in Lemma 4.1 means that the nodes are not "over concentrated" on one spot. This is a mild requirement, which only excludes the special topologies where there is an infinite number of nodes over a finite coverage area. Condition 2) refers to path loss dominated situations, which applies to medium-range (over 2-5 km) multi-hop networks. ■

As a result, the suboptimal solution $\Omega^{low}_{ij}$ has reasonable performance in general cases (as will be illustrated in Section V) and it is asymptotically optimal for a large number of nodes. In order to derive $\Omega^{low}_{ij}$, we shall first express (20) in a recursive form and solve the problem by divide-and-conquer using DP. Define

$$g\big(\eta_{ij}(k); P_{ij}(k), l_{ij}(k)\big) = \frac{1 + \lambda_{ij}\big(P_{ij}(k) - P_{ij}\big)}{\log\big(1 + P_{ij}(k)\,G_{s_{ij}(k)\,l_{ij}(k)}(k)\big)} \tag{22}$$

then the problem (20) can be expressed recursively as:

$$J(s_{ij}(k)) = E_{\mathbf{G}_{s_{ij}(k)}}\left[\min_{P_{ij}(k),\,l_{ij}(k)} \Big(g\big(\eta_{ij}(k); P_{ij}(k), l_{ij}(k)\big) + J(l_{ij}(k))\Big)\right] \tag{23}$$

where $J(m)$ is called the expected cost from node $R_m$ to $R_j$. Note that $J(j) = 0$ and $J(s_{ij}(1)) = J(i)$ gives the value of (20). As a result of the recursive form in (23), the backward recursion algorithm to solve problem (20) is summarized in the following.
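Algorithm 1 (Offline and Online Solution of the Sub Problem):
• Offline Recursion:

- Step 1: Initialize $\lambda_{ij} = 0$.

- Step 2: For $s = j-1, j-2, \ldots, i$, determine $J(s)$ by (here we assume node $R_s$ has knowledge of the distribution of the local channel state $\mathbf{G}_s$):

$$J(s) = E_{\mathbf{G}_s(k)}\left[\min_{m \in \{s+1, \ldots, j\}} \left(\frac{1 + \lambda_{ij}\big(P^*_{sm}(\lambda_{ij}) - P_{ij}\big)}{\log\big(1 + P^*_{sm}(\lambda_{ij})\,G_{sm}(k)\big)} + J(m)\right)\right] \tag{24}$$

where $P^*_{sm}(\lambda_{ij})$ is the solution to (25) defined below for the hop from $R_s$ to $R_m$. The values of $J(s)$ are stored.

- Step 3: Substitute the solution obtained from Step 2 into (21). If the LHS is larger (smaller) than $P_{ij}$ by $\epsilon$, increase (decrease) $\lambda_{ij}$ by a step $\delta$ and go to Step 2. Otherwise, stop.

• Online Policy:

- Step 1: Set $k = 1$ and $s_k = i$.

- Step 2: Obtain the local CSI $\mathbf{G}_{s_{ij}(k)}(k)$. The optimizing hop selection and power control actions are given by $P^*_{s_k l^*_k}(\lambda_{ij})$ and:

$$l^*_k = \arg\min_{l \in \{s_k+1, \ldots, j\}} \left(\frac{1 + \lambda_{ij}\big(P^*_{s_k l}(\lambda_{ij}) - P_{ij}\big)}{\log\big(1 + P^*_{s_k l}(\lambda_{ij})\,G_{s_k l}(k)\big)} + J(l)\right)$$

- Step 3: Set $s_{k+1} = l^*_k$ and $k := k + 1$. If $s_k \neq j$, go to Step 2. Otherwise, stop.

The optimal power is the value of $P_{ij}(k)$ that solves:

$$\frac{G_{s_{ij}(k)\,l_{ij}(k)}(k)}{\big(1 + P_{ij}(k)\,G_{s_{ij}(k)\,l_{ij}(k)}(k)\big)\log\big(1 + P_{ij}(k)\,G_{s_{ij}(k)\,l_{ij}(k)}(k)\big) + \big(P_{ij} - P_{ij}(k)\big)\,G_{s_{ij}(k)\,l_{ij}(k)}(k)} = \lambda_{ij} \tag{25}$$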

Remark 3: Note that the memory size of the table in the offline recursion is $j - i$. The computational complexity of the online algorithm in each step $k$ is only of the order $j - i$. Hence, the online algorithm has worst case complexity $O(M^2)$ and worst case memory requirement $O(M)$ for each continuous segment $L_{ij}$. ■
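The following Python sketch illustrates one way the offline backward recursion and the online hop selection could be organized for a single segment. It is not the authors' implementation: it assumes a fixed multiplier $\lambda_{ij}$ with $0 < \lambda_{ij} P_{ij} < 1$ (the outer search over $\lambda_{ij}$ in Step 3 is omitted), Rayleigh fading so that the fading power is exponentially distributed with unit mean, natural logarithms, a Monte Carlo estimate of the expectation, and bisection to solve the power equation (25). All function names are hypothetical.

```python
import math
import random

def optimal_power(gain, lam, p_bar, tol=1e-10):
    """Solve (25) for the hop power by bisection; assumes 0 < lam * p_bar < 1."""
    if not (lam > 0.0 and lam * p_bar < 1.0):
        raise ValueError("sketch assumes 0 < lam * p_bar < 1")
    def h(p):  # increasing in p; its root is the stationary point of (22)
        x = 1.0 + p * gain
        return lam * (x * math.log1p(p * gain) + (p_bar - p) * gain) - gain
    lo, hi = 0.0, 1.0
    while h(hi) < 0.0:          # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hop_cost(gain, lam, p_bar):
    """Per-hop cost g of (22) evaluated at the power solving (25); returns (cost, power)."""
    p = optimal_power(gain, lam, p_bar)
    return (1.0 + lam * (p - p_bar)) / math.log1p(p * gain), p

def draw_gain(path_gain, rng):
    """One realization of |H|^2 * D, floored to avoid a zero gain."""
    return max(rng.expovariate(1.0) * path_gain, 1e-12)

def offline_recursion(path_gain, lam, p_bar, samples=2000, seed=0):
    """Backward recursion (24) on nodes 0..n of one segment; returns [J(0), ..., J(n)]."""
    rng = random.Random(seed)
    n = len(path_gain) - 1
    J = [0.0] * (n + 1)         # J(n) = 0 at the end-node
    for s in range(n - 1, -1, -1):
        total = 0.0
        for _ in range(samples):
            total += min(hop_cost(draw_gain(path_gain[s][m], rng), lam, p_bar)[0] + J[m]
                         for m in range(s + 1, n + 1))
        J[s] = total / samples
    return J

def online_policy(path_gain, J, lam, p_bar, seed=1):
    """Hop sequence chosen frame by frame from freshly observed local gains only."""
    rng = random.Random(seed)
    n = len(path_gain) - 1
    s, route = 0, [0]
    while s != n:
        s = min(range(s + 1, n + 1),
                key=lambda m: hop_cost(draw_gain(path_gain[s][m], rng), lam, p_bar)[0] + J[m])
        route.append(s)
    return route
```

Note that each online step only evaluates the candidates downstream of the current node and looks up the stored table `J`, which reflects the per-step cost of order $j - i$ mentioned in Remark 3.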

B. Solving the Main Problem 

After solving the subproblem, we shall focus in this section on solving the main problem based on $U^{low}_{ij}(P_{ij})$ (which is of a longer time scale). We first establish the following lemma regarding the concavity of $U^{low}_{ij}(P_{ij})$ w.r.t. $P_{ij}$.

Lemma 4.2 (Concavity of the Lower Bound of $U^*_{ij}(P_{ij})$): The lower bound $U^{low}_{ij}(P_{ij})$ of $U^*_{ij}(P_{ij})$ is a concave function of $P_{ij}$.

Proof: Please refer to Appendix E for the proof. ■
From Lemma 4.2, it is easy to deduce that the lower-bound version of the master problem in (16) [with $U^*_{ij}(P_{ij})$ replaced by $U^{low}_{ij}(P_{ij})$] is a convex optimization problem. As a result, the standard gradient search can be applied to solve the master problem. Please refer to Figure 10 for the detailed algorithm description.

Remark 4: Note that the offline recursion needs to be updated only when there are changes in the PU statistics or the SU path loss and, in practice, the above offline algorithm is computed over a long time scale. Combining the master problem and the subproblems, the total memory requirement of the offline table in Algorithm 1 is $O(M^3)$. ■

V. Simulation Results 

In this section, we shall illustrate the performance of the proposed scheme by simulation. We consider a multi-hop cognitive relay system with 6 nodes ($\{R_0, R_1, \ldots, R_5\}$) and 6 PUs (one PU in the neighborhood of each SU node). The distance between $R_0$ and $R_5$ is 5, and the other 4 nodes are randomly scattered between them. The path loss between two nodes $R_i$, $R_j$ is given by the "flat-earth model" [24]: $\log_{10} D_{ij} = -\alpha \log_{10} d_{ij}$ (dB), where $d_{ij}$ is the distance between the two nodes and $\alpha$ is the path loss exponent. The proposed scheme is compared with the four schemes below:

• Direct transmission only (Baseline 1): $R_0$ transmits directly to $R_5$ when all PUs remain silent ($S_i = 1$, $\forall i \in \{0, 1, \ldots, 5\}$). This is equivalent to the case without relays.

• Per-node transmission only (Baseline 2): If $R_m$ ($\forall m \in \{0, 1, \ldots, 4\}$) received a packet in previous frames, it transmits this packet to $R_{m+1}$ when the PU activity permits ($A_m = S_{m+1} = 1$). This corresponds to the traditional DF multi-hop relay scheme.

• Direct (per-node) transmission with dynamic spatial reuse (Baseline 3/4): These two schemes adopt the same dynamic spatial reuse method as the proposed scheme. Yet, within each continuous segment, they adopt direct and per-node transmission respectively.
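For readers who want to reproduce a topology of the kind used in this section, the sketch below generates node positions and the corresponding path loss table $D_{ij} = d_{ij}^{-\alpha}$. It assumes, for illustration only, that the nodes lie on a straight line between the source and the destination and that the intermediate positions are drawn uniformly; the function names and default values are hypothetical.

```python
import random

def random_topology(num_nodes=6, span=5.0, seed=0):
    """Place the source at 0, the destination at span, and the relays uniformly in between."""
    rng = random.Random(seed)
    inner = sorted(rng.uniform(0.0, span) for _ in range(num_nodes - 2))
    return [0.0] + inner + [span]

def path_gain_matrix(positions, alpha):
    """D[a][b] = d_ab ** (-alpha) for forward links (b > a); other entries are unused (0.0)."""
    n = len(positions)
    return [[abs(positions[b] - positions[a]) ** (-alpha) if b > a else 0.0
             for b in range(n)] for a in range(n)]

positions = random_topology()
D = path_gain_matrix(positions, alpha=3.0)
```

A table produced this way satisfies the monotonicity in condition 2) of Lemma 4.1, since the path loss term decreases with the distance between the nodes.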

Figure 6 and Figure 7 illustrate the average end-to-end throughput ($U$) versus the average
SNR ($P_0$) and the PU activity level ($\Pr(A_m = 0)$), respectively. The proposed scheme achieves
significant throughput gains over a wide range of SNR and PU activities. This gain is
contributed by both the dynamic hop selection and the dynamic power control. Comparison
with Baseline 1 illustrates how cognitive relays help to increase the probability of access
and the efficiency of spectrum sharing in general. Comparison with Baselines 2 and 3 illustrates
the importance of joint dynamic power control and opportunistic hop selection in cognitive multi-hop
systems. The gain contributed by the dynamic hop selection is most significant under moderate
SNR. At very high SNR, the dynamic hopping performance approaches that of
Baseline 3, illustrating that the system always performs one-hop direct transmission to avoid the half-duplex
penalty. At very low SNR, the performance of the proposed scheme approaches that of
Baseline 4, illustrating that the system prefers hop-by-hop transmission for SNR gain.
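The trade-off between the half-duplex penalty and the SNR gain can be illustrated with a toy calculation. The sketch below is not the simulation setup of the paper: it assumes no fading, equally spaced relays, a fixed transmit SNR per hop, a link gain of distance to the power of minus alpha, and rates measured with a base-2 logarithm. All function names are hypothetical.

```python
import math


def link_gain(distance: float, alpha: float) -> float:
    """Assumed deterministic path gain d^(-alpha)."""
    return distance ** (-alpha)


def direct_rate(snr: float, total_distance: float, alpha: float) -> float:
    """Rate of a single hop spanning the whole chain."""
    return math.log2(1.0 + snr * link_gain(total_distance, alpha))


def hop_by_hop_rate(snr: float, total_distance: float, hops: int, alpha: float) -> float:
    """End-to-end rate when the packet is relayed over `hops` equal hops.

    Each hop needs 1 / rate time units per bit and the hops are used one
    after another, so the per-bit times add up.
    """
    per_hop_rate = math.log2(1.0 + snr * link_gain(total_distance / hops, alpha))
    time_per_bit = hops / per_hop_rate
    return 1.0 / time_per_bit


ALPHA = 3.0      # path loss exponent (assumed)
DISTANCE = 5.0   # end-to-end distance, as in the simulation setup
HOPS = 5         # six nodes give five hops

high_snr = 10 ** (30 / 10)  # 30 dB
low_snr = 10 ** (10 / 10)   # 10 dB

direct_high = direct_rate(high_snr, DISTANCE, ALPHA)
relay_high = hop_by_hop_rate(high_snr, DISTANCE, HOPS, ALPHA)
direct_low = direct_rate(low_snr, DISTANCE, ALPHA)
relay_low = hop_by_hop_rate(low_snr, DISTANCE, HOPS, ALPHA)
```

Under these assumptions the direct link wins at 30 dB while the relayed path wins at 10 dB, which mirrors the qualitative behaviour described above.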

Figure 7 illustrates that the dynamic hopping gain is more prominent under low PU activity.
This is because at low PU activity, there is a higher chance of forming a longer continuous
segment and hence, more flexible choices for the dynamic hop selection. Figure 8 illustrates the
convergence rate of the offline recursion for the Main Problem (Algorithm 2). The proposed
algorithm achieves 90% of the converged performance within 10 iterations and converges
after about 30 iterations. This convergence rate is adequate for an offline algorithm.

Figure 9 illustrates the normalized throughput $U / U_{\max}$ versus the average transmit SNR
($P_0$) for various numbers of cognitive relay nodes, where $U_{\max}$ is obtained from brute-force
numerical optimization of Problem 1. With $N = 6$, we have over 95% of the optimal
performance. This illustrates that the proposed scheme is not only order-optimal but also achieves
close-to-optimal performance even with a small to moderate number of cognitive relay nodes.

VI. Summary 

In this paper, we have derived low-complexity hop selection and dynamic power control
policies to maximize the average end-to-end throughput of the cognitive multi-hop SU system
with dynamic spatial reuse. By exploiting the time-scale difference between the PU activity
and the CSI dynamics, we decompose the problem into a Master Problem and several
Subproblems. The solution obtained is decentralized in the sense that each node determines its
next hop and transmit power based on the local and causal CSI only. The solution consists of
an offline recursion and an online algorithm with worst-case complexity $O(M^2)$ and worst-case
memory requirement $O(M^3)$. Furthermore, the solution is asymptotically optimal for
a large number of nodes. Significant throughput gains have been demonstrated by simulation.

Appendix A 
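Proof of Lemma 3.1

$$U_m - U_{m+1} = \sum_{i=0}^{m-1} \sum_{j=m}^{M} \Pr(i,j) U_{ij} - \sum_{i=0}^{m} \sum_{j=m+1}^{M} \Pr(i,j) U_{ij}$$

$$= \sum_{i=0}^{m-1} \Pr(i,m) U_{i,m} - \sum_{j=m+1}^{M} \Pr(m,j) U_{m,j}, \quad \forall m \in \{1, \ldots, M-1\}$$

Hence:

$$U_m \ge U_{m+1} \Leftrightarrow \sum_{i=0}^{m-1} \Pr(i,m) U_{i,m} - \sum_{j=m+1}^{M} \Pr(m,j) U_{m,j} \ge 0$$

$$\Leftrightarrow \sum_{i=0}^{m-1} \Pr(i,m) U_{i,m} \ge \sum_{j=m+1}^{M} \Pr(m,j) U_{m,j} \tag{26}$$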
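The cancellation behind (26) can be checked numerically. The sketch below is an illustration only: the tables `pr` and `u` are filled with arbitrary random numbers standing in for the segment probabilities and segment throughputs, and the function names are hypothetical. It compares the difference between the throughputs crossing two adjacent sections with the difference between the flow terminating at node m and the flow originating at node m.

```python
import random


def section_throughput(pr, u, m, M):
    """Flow crossing section m: segments (i, j) with i <= m - 1 and j >= m."""
    return sum(pr[i][j] * u[i][j] for i in range(0, m) for j in range(m, M + 1))


def balance_gap(pr, u, m, M):
    """Flow arriving at node m minus flow leaving node m."""
    inflow = sum(pr[i][m] * u[i][m] for i in range(0, m))
    outflow = sum(pr[m][j] * u[m][j] for j in range(m + 1, M + 1))
    return inflow - outflow


M = 5
rng = random.Random(0)
# Arbitrary placeholder values; only the algebraic identity is being checked.
pr = [[rng.random() for _ in range(M + 1)] for _ in range(M + 1)]
u = [[rng.random() for _ in range(M + 1)] for _ in range(M + 1)]

max_error = max(
    abs(section_throughput(pr, u, m, M)
        - section_throughput(pr, u, m + 1, M)
        - balance_gap(pr, u, m, M))
    for m in range(1, M)
)
```

Because all segments that cross both sections cancel, only the segments ending or starting at node m remain, and `max_error` stays at the level of floating-point rounding.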

Appendix B 
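Proof of Lemma 3.2

To prove that Problem 2 and Problem 3 are equivalent to Problem 1, we first prove the following
Lemma:

Lemma B.1: Define:

$$V = \max_{X} \min_{m \in \{1, 2, \ldots, M\}} \left( \sum_{i=1}^{L} A_{mi} f_i(x_i) \right) \tag{27}$$

$$V' = \min_{m \in \{1, 2, \ldots, M\}} \max_{X} \left( \sum_{i=1}^{L} A_{mi} f_i(x_i) \right) \tag{28}$$

where $X = \{x_i \in C_i, i \in \{1, 2, \ldots, L\}\}$ is a set of independent variables. If $f_i(x_i)$ is finite
and $A_{mi} \ge 0$, $\forall m \in \{1, 2, \ldots, M\}$, $i \in \{1, 2, \ldots, L\}$, then:

$$V = V' = \min_{m \in \{1, 2, \ldots, M\}} \left( \sum_{i=1}^{L} A_{mi} f_i^* \right) \tag{29}$$

where $f_i^* = \max_{x_i \in C_i} f_i(x_i)$.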

Proof: In general, switching "max" and "min" is not allowed, but there are two
specific structures in Lemma B.1 that we are exploiting.

• Independency Property: $f_i(x_i), \forall i$ are mutually independent (i.e., they are not coupled
by any common variables), as $X = \{x_i, i \in \{1, 2, \ldots, L\}\}$ is a set of independent variables.

• Monotone Property: Since $A_{mi} \ge 0$, $\forall m, i$, $\sum_{i=1}^{L} A_{mi} f_i(x_i)$ is a
non-decreasing function of $f_i(x_i)$. As a result, $V$ is a non-decreasing function of $f_i(x_i)$,
$\forall i \in \{1, 2, \ldots, L\}$.

Since $V$ is a non-decreasing function of $f_i(x_i)$ (Monotone Property) and $f_i(x_i) \le f_i^*$, $\forall i \in
\{1, 2, \ldots, L\}$:

$$V \le \min_{m \in \{1, 2, \ldots, M\}} \left( \sum_{i=1}^{L} A_{mi} f_i^* \right) \tag{30}$$

Moreover, denote $x_i^* = \arg\max_{x_i \in C_i} f_i(x_i)$. From the Independency Property, $\{x_i = x_i^*, i \in
\{1, 2, \ldots, L\}\}$ is a feasible point for $V$. Hence:

$$V \ge \min_{m \in \{1, 2, \ldots, M\}} \left( \sum_{i=1}^{L} A_{mi} f_i^* \right) \tag{31}$$

Combining (30) and (31), $V = \min_{m \in \{1, 2, \ldots, M\}} \left( \sum_{i=1}^{L} A_{mi} f_i^* \right)$.

On the other hand, since $\forall m$, $\max_{X} \left( \sum_{i=1}^{L} A_{mi} f_i(x_i) \right) = \sum_{i=1}^{L} A_{mi} f_i^*$:

$$V' = \min_{m \in \{1, 2, \ldots, M\}} \left( \sum_{i=1}^{L} A_{mi} f_i^* \right) \tag{32}$$

■
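Lemma B.1 can be sanity-checked by brute force on a small instance. The sketch below is illustrative only: each feasible set is replaced by a short list of achievable values of the corresponding function, the non-negative weights are drawn at random, and all names are hypothetical. It evaluates the max-min value, the min-max value and the closed form in (29) by direct enumeration.

```python
import itertools
import random


def weighted_sum(row, values):
    """Sum over i of A_mi * f_i for one row m of the weight matrix."""
    return sum(a * f for a, f in zip(row, values))


def max_min(A, f_values):
    """V: maximize over joint choices the minimum over rows."""
    return max(
        min(weighted_sum(row, choice) for row in A)
        for choice in itertools.product(*f_values)
    )


def min_max(A, f_values):
    """V': minimize over rows the maximum over joint choices."""
    return min(
        max(weighted_sum(row, choice) for choice in itertools.product(*f_values))
        for row in A
    )


def closed_form(A, f_values):
    """Right-hand side of (29): plug the per-variable maxima into every row."""
    f_star = [max(values) for values in f_values]
    return min(weighted_sum(row, f_star) for row in A)


rng = random.Random(42)
num_rows, num_vars, num_choices = 4, 3, 4
A = [[rng.random() for _ in range(num_vars)] for _ in range(num_rows)]       # A_mi >= 0
f_values = [[rng.uniform(-1.0, 1.0) for _ in range(num_choices)] for _ in range(num_vars)]

v = max_min(A, f_values)
v_prime = min_max(A, f_values)
v_closed = closed_form(A, f_values)
```

The three values coincide here because every weight is non-negative and the variables are not coupled; with a negative weight or a shared variable the equality is not guaranteed.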

In Problem 1, for a fixed $\mathcal{P}_{main}$, denote:

$$\sum_{i=0}^{m-1} \sum_{j=m}^{M} \Pr(i,j) U_{ij} = \sum_{i=0}^{M-1} \sum_{j=1}^{M} A(i,j,m) U_{ij} \tag{33}$$

where $A(i,j,m) = \Pr(i,j)$ if $i < m \le j$, and $A(i,j,m) = 0$ otherwise.

Note that: a) From (7), (8), $U_{ij}$ and $P_{ij}$ depend on different sets of variables $\mathcal{L}_{ij}$ and
$\mathcal{P}_{ij}$. Hence, for a given $\mathcal{P}_{main} = \{P_{ij}\}$, constraint (15) is decoupled and $\{\mathcal{L}_{ij}, \mathcal{P}_{ij}\}$ become
independent variables for different $\{i, j\}$.

b) From (14), for all $i, j$, $\Pr(i,j) \ge 0$, so $A(i,j,m) \ge 0$, $\forall i, j, m$.



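Combining a) and b), we can apply Lemma B.1 and obtain:

$$\max_{\{\mathcal{L}_{ij}, \mathcal{P}_{ij}\}} \min_{m} \left( \sum_{i=0}^{m-1} \sum_{j=m}^{M} \Pr(i,j) U_{ij} \right) = \min_{m} \left( \sum_{i=0}^{m-1} \sum_{j=m}^{M} \Pr(i,j) U^*_{ij}(P_{ij}) \right) \tag{34}$$

where $U^*_{ij}(P_{ij})$ is given by the solution of Problem 3. Hence, we can rewrite the objective
function as:

$$\min_{m} \left( \sum_{i=0}^{m-1} \sum_{j=m}^{M} \Pr(i,j) U^*_{ij}(P_{ij}) \right) \tag{35}$$

which is exactly the objective function in Problem 2. Therefore, the optimal solution given
by Problem 2 and Problem 3 is the same as that of Problem 1.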
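Appendix C
Proof of Lemma 4.1

We shall prove that the suboptimal solution $\tilde{U}_{ij}$ is asymptotically optimal under the two
conditions in Lemma 4.1. We shall first prove the following Lemma:

Lemma C.1: Suppose: 1) For any $\epsilon > 0$, there exists a finite $C > 0$ such that when
$|s - t| > C$, $G_{st} < \epsilon$; 2) $G_{st} \ge G_{st'}$, $G_{st} \ge G_{s't}$ when $t' > t > s > s'$. Then:

$$\frac{\sum_{k=1}^{j-i} T(\eta_{ij}(k), \Omega_{ij})}{\sum_{k=1}^{j-i} E^{\Omega_{ij}}\left[ T(\eta_{ij}(k), \Omega_{ij}) \right]} \to 1 \tag{36}$$

$$\frac{\sum_{k=1}^{j-i} P_{ij}(k) T(\eta_{ij}(k), \Omega_{ij})}{\sum_{k=1}^{j-i} E^{\Omega_{ij}}\left[ P_{ij}(k) T(\eta_{ij}(k), \Omega_{ij}) \right]} \to 1 \tag{37}$$

in probability when $j - i \to \infty$.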



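Proof: We partition the continuous segment $R_i \sim R_j$ into $R = \lceil (j-i)/C \rceil$ clusters: $V_r =
\{i + rC, i + rC + 1, \ldots, \min(i + (r+1)C - 1, j)\}$, $r \in \{0, 1, \ldots, R-1\}$. As for any $\epsilon > 0$ there
exists a finite $C > 0$ such that when $|s - t| > C$, $G_{st} < \epsilon$, letting $\epsilon \to 0$, we have (see footnote 1):

$$l_{ij}(k) - s_{ij}(k) \le C, \quad \forall k \in \{1, 2, \ldots, j-i\} \tag{38}$$

Footnote 1: Otherwise, $T(\eta_{ij}(k), \Omega_{ij}) = \frac{1}{\log(1 + G_{s_{ij}(k) l_{ij}(k)}(k) P_{ij}(k))} \ge \frac{1}{\epsilon P_{ij}(k)} \to \infty$.

Denote $T_r = \sum_{k : l_{ij}(k) \in V_r} T(\eta_{ij}(k), \Omega_{ij}) = \sum_{k : s_{ij}(k) \ne j, \, l_{ij}(k) \in V_r} \frac{1}{\log(1 + G_{s_{ij}(k) l_{ij}(k)}(k) P_{ij}(k))}$, then:
$\sum_{k=1}^{j-i} T(\eta_{ij}(k), \Omega_{ij}) = \sum_{r=0}^{R-1} T_r$. Moreover, from (38), we have: $1 \le |\{k : s_{ij}(k) \ne j, l_{ij}(k) \in
V_r\}| \le C$. Moreover, as in practice the time duration to transmit one bit should be positive and
finite, there should exist $T_{\min}, T_{\max} \in \mathbb{R}^+$ such that $T_{\min} \le T(\eta_{ij}(k), \Omega_{ij}) \le T_{\max}$, $\forall \eta_{ij}(k)$.
Hence we have:

$$T_{\min} \le T_r \le C T_{\max}, \quad \forall r \in \{0, 1, \ldots, R-1\} \tag{39}$$

As we shall prove in Appendix D, we have the following result concerning the covariance
between $\{T_r\}$:

Lemma C.2: Given: 1) For any $\epsilon > 0$, there exists a finite $C > 0$ such that when $|s - t| > C$,
$G_{st} < \epsilon$; 2) $G_{st} \ge G_{st'}$, $G_{st} \ge G_{s't}$ when $t' > t > s > s'$. We have: $\mathrm{Cov}(T_r, \sum_{s=0}^{r-1} T_s) \le 0$,
$\forall r \in \{1, 2, \ldots, R-1\}$. ■

With Lemma C.2 and (39), we have:

$$\frac{\mathrm{Var}\left( \sum_{r=0}^{R-1} T_r \right)}{\left( E\left[ \sum_{r=0}^{R-1} T_r \right] \right)^2} \le \frac{\sum_{r=0}^{R-1} \mathrm{Var}(T_r) + 2 \sum_{r=1}^{R-1} \mathrm{Cov}\left( T_r, \sum_{s=0}^{r-1} T_s \right)}{(R T_{\min})^2} \le \frac{\sum_{r=0}^{R-1} \mathrm{Var}(T_r)}{(R T_{\min})^2} \le \frac{C^2 T_{\max}^2}{R T_{\min}^2} \to 0 \quad \text{as } R = \lceil (j-i)/C \rceil \to \infty \tag{40}$$

Substituting (40) into the Chebyshev inequality, (36) is proved; (37) can be proved through a
similar process. We omit the details due to the page limit. ■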
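The Chebyshev step can be spelled out as follows. This is a sketch only: the shorthand $S_R$ for the sum of the cluster times $T_r$ and the tolerance $\delta$ are introduced here for illustration and do not appear in the original derivation. The step uses $E[S_R] > 0$, which holds because every $T_r$ is bounded below by a positive constant in (39).

```latex
% Shorthand assumed for this sketch: S_R = \sum_{r=0}^{R-1} T_r
\Pr\left( \left| \frac{S_R}{E[S_R]} - 1 \right| \ge \delta \right)
  = \Pr\left( \left| S_R - E[S_R] \right| \ge \delta \, E[S_R] \right)
  \le \frac{\mathrm{Var}(S_R)}{\delta^2 \left( E[S_R] \right)^2}
  \to 0 \quad \text{as } R \to \infty, \qquad \forall \delta > 0
```

The right-hand side vanishes because (40) bounds the ratio of the variance to the squared mean by a term that decays as the number of clusters grows, which is exactly convergence in probability of $S_R / E[S_R]$ to 1.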

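From (36), (37):

$$\frac{1}{\sum_{k=1}^{j-i} T(\eta_{ij}(k), \Omega_{ij})} \to \frac{1}{\sum_{k=1}^{j-i} E^{\Omega_{ij}}\left[ T(\eta_{ij}(k), \Omega_{ij}) \right]}$$

and

$$\frac{\sum_{k=1}^{j-i} P_{ij}(k) T(\eta_{ij}(k), \Omega_{ij})}{\sum_{k=1}^{j-i} T(\eta_{ij}(k), \Omega_{ij})} \to \frac{\sum_{k=1}^{j-i} E^{\Omega_{ij}}\left[ P_{ij}(k) T(\eta_{ij}(k), \Omega_{ij}) \right]}{\sum_{k=1}^{j-i} E^{\Omega_{ij}}\left[ T(\eta_{ij}(k), \Omega_{ij}) \right]}$$

in probability when $j - i \to \infty$.

Hence, for sufficiently large $j - i$, Problem 3 can be equivalently rewritten as:

$$\min_{\Omega_{ij}} \sum_{k=1}^{j-i} E^{\Omega_{ij}}\left[ T(\eta_{ij}(k), \Omega_{ij}) \right] \tag{41}$$

$$\text{s.t.} \quad \sum_{k=1}^{j-i} E^{\Omega_{ij}}\left[ P_{ij}(k) T(\eta_{ij}(k), \Omega_{ij}) - P_{ij} T(\eta_{ij}(k), \Omega_{ij}) \right] \le 0 \tag{42}$$

Observe that the Lagrangian dual function of the above problem is exactly (20). Hence,
$\tilde{U}_{ij}(P_{ij}) \to U^*_{ij}(P_{ij})$ for sufficiently large $j - i$.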



Appendix D 
Proof of Lemma C.2

We shall first prove the following Lemma:

Lemma D.1: Given three sequences $a_0 \le a_1 \le \ldots \le a_N$, $b_0 \ge b_1 \ge \ldots \ge b_N$, $p_n \ge
0$, $n \in \{0, 1, \ldots, N\}$ which satisfy $\sum_{n=0}^{N} p_n = 1$, $\sum_{n=0}^{N} p_n a_n = \sum_{n=0}^{N} p_n b_n = 0$, we have:
$\sum_{n=0}^{N} p_n a_n b_n \le 0$. ■



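Proof: Denote $N_a^- = |\{n : a_n < 0\}|$, $N_b^+ = |\{n : b_n > 0\}|$, where $|A|$ means the
cardinality of set $A$. If $N_a^- = N_b^+$, then obviously $\sum_{n=0}^{N} p_n a_n b_n \le 0$. Otherwise, without loss
of generality, assume $N_a^- > N_b^+$ and then:

$$\sum_{n=0}^{N} p_n a_n b_n = \sum_{n=0}^{N_b^+ - 1} p_n a_n b_n + \sum_{n=N_b^+}^{N_a^- - 1} p_n a_n b_n + \sum_{n=N_a^-}^{N} p_n a_n b_n \le \sum_{n=0}^{N_b^+ - 1} p_n a_n b_n + \sum_{n=N_b^+}^{N_a^- - 1} p_n a_n b_n$$

$$\le a_{(N_b^+ - 1)} \sum_{n=0}^{N_b^+ - 1} p_n b_n + a_{(N_b^+)} \sum_{n=N_b^+}^{N_a^- - 1} p_n b_n \le \left( a_{(N_b^+ - 1)} - a_{(N_b^+)} \right) \sum_{n=0}^{N_b^+ - 1} p_n b_n \le 0 \tag{43}$$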
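Lemma D.1 can be probed numerically on random instances. The sketch below is a sanity check rather than a proof: it draws a probability vector, sorts one random sequence in increasing order and another in decreasing order, shifts both so that their weighted means are zero (which preserves the ordering), and records the largest weighted cross sum observed. All names are hypothetical.

```python
import random


def centered(values, probs):
    """Shift a sequence so that its probability-weighted mean is zero."""
    mean = sum(p * v for p, v in zip(probs, values))
    return [v - mean for v in values]


def weighted_cross_sum(a, b, p):
    """Compute the sum over n of p_n * a_n * b_n."""
    return sum(pn * an * bn for pn, an, bn in zip(p, a, b))


def random_instance(size, rng):
    """Draw (a, b, p) satisfying the hypotheses of Lemma D.1."""
    raw = [rng.random() for _ in range(size)]
    total = sum(raw)
    p = [x / total for x in raw]                                   # p_n >= 0, sums to 1
    a = centered(sorted(rng.uniform(-1.0, 1.0) for _ in range(size)), p)  # non-decreasing
    b = centered(
        sorted((rng.uniform(-1.0, 1.0) for _ in range(size)), reverse=True), p
    )                                                              # non-increasing
    return a, b, p


rng = random.Random(1)
worst = max(weighted_cross_sum(*random_instance(6, rng)) for _ in range(200))
```

A non-positive `worst` (up to rounding) is what the lemma predicts: oppositely ordered zero-mean sequences have a non-positive weighted inner product.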



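Recall the system state transition kernel:

$$\Pr(\eta_{ij}(k) \mid \eta_{ij}(k-1), \Omega_{ij}) = \mathbf{1}\left( s_{ij}(k) = l_{ij}(k-1) \right) \Pr(G_{s_{ij}}(k)) \tag{44}$$

It can be observed that, conditioned on the source node at frame $k$, $s_{ij}(k)$, $\{\eta_{ij}(k), \eta_{ij}(k+
1), \ldots\}$ are independent of $\{\eta_{ij}(1), \eta_{ij}(2), \ldots, \eta_{ij}(k-1)\}$. Correspondingly, $\{T(\eta_{ij}(s), \Omega_{ij}), s \in
\{k, k+1, \ldots\}\}$ are conditionally independent of $\{T(\eta_{ij}(s), \Omega_{ij}), s \in \{1, 2, \ldots, k-1\}\}$. Denote
$l_{r,\min} = \min(l_{ij}(k) : l_{ij}(k) \in V_r)$. Then, conditional on $l_{r,\min}$, $T_r$ is independent of
$\{T_s, s \in \{0, 1, \ldots, r-1\}\}$. Hence:

$$E^{\Omega_{ij}}\left( T_r \sum_{s=0}^{r-1} T_s \right) = \sum_{x} \Pr(l_{r,\min} = x) \, E^{\Omega_{ij}}\left( T_r \mid l_{r,\min} = x \right) E^{\Omega_{ij}}\left( \sum_{s=0}^{r-1} T_s \,\Big|\, l_{r,\min} = x \right) \tag{45}$$

where $x \in \{i + rC, i + rC + 1, \ldots, i + rC + |V_r| - 1\}$. Denote $k_{r,\min} = \min(k : l_{ij}(k) \in V_r)$.
Since $G_{st} \ge G_{st'}$ when $s < t < t'$, $T(\eta_{ij}(k_{r,\min} - 1))$ is a non-decreasing function
of $l_{r,\min}$. Correspondingly, $E^{\Omega_{ij}}\left( \sum_{s=0}^{r-1} T_s \mid l_{r,\min} \right)$ is a non-decreasing function of $l_{r,\min}$.
Similarly, as $G_{st} \ge G_{s't}$ when $s' < s < t$, $E^{\Omega_{ij}}(T_r \mid l_{r,\min})$ is a non-increasing function
of $l_{r,\min}$. Let $E^{\Omega_{ij}}\left( \sum_{s=0}^{r-1} T_s \mid l_{r,\min} = x \right) - E^{\Omega_{ij}}\left( \sum_{s=0}^{r-1} T_s \right) = a_n$, $E^{\Omega_{ij}}(T_r \mid l_{r,\min} = x) -
E^{\Omega_{ij}}(T_r) = b_n$, $\Pr(l_{r,\min} = x) = p_n$ and substitute into Lemma D.1:

$$\sum_{x=i+rC}^{i+rC+|V_r|-1} \Pr(l_{r,\min} = x) \left( E^{\Omega_{ij}}(T_r \mid l_{r,\min} = x) - E^{\Omega_{ij}}(T_r) \right) \left( E^{\Omega_{ij}}\left( \sum_{s=0}^{r-1} T_s \,\Big|\, l_{r,\min} = x \right) - E^{\Omega_{ij}}\left( \sum_{s=0}^{r-1} T_s \right) \right) \le 0.$$

From this result and (45):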

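$$\mathrm{Cov}\left( T_r, \sum_{s=0}^{r-1} T_s \right) = E^{\Omega_{ij}}\left( T_r \sum_{s=0}^{r-1} T_s \right) - E^{\Omega_{ij}}(T_r) \, E^{\Omega_{ij}}\left( \sum_{s=0}^{r-1} T_s \right)$$

$$= \sum_{x=i+rC}^{i+rC+|V_r|-1} \Pr(l_{r,\min} = x) \left( E^{\Omega_{ij}}(T_r \mid l_{r,\min} = x) - E^{\Omega_{ij}}(T_r) \right) \left( E^{\Omega_{ij}}\left( \sum_{s=0}^{r-1} T_s \,\Big|\, l_{r,\min} = x \right) - E^{\Omega_{ij}}\left( \sum_{s=0}^{r-1} T_s \right) \right) \le 0 \tag{46}$$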

Appendix E 
Proof of Lemma 4.3

Due to the Theorem of Lagrangian ([23], Section 5.2.3), we have



where $\lambda^*_{ij}(P_{ij})$ is the Lagrange multiplier obtained in the subproblem via Algorithm 1. Hence,
the lemma holds if and only if $\lambda^*_{ij}(P_{ij})$ is a non-increasing function of $P_{ij}$. Note that in
(25), $\forall k, l_k$, $G_{s_{ij}(k) l_{ij}(k)}(k) > 0$: $P_{ij}(k)$ decreases as $\lambda_{ij}$ increases. Substituting this result into
(21), it follows that $\lambda^*_{ij}(P_{ij})$ decreases as $P_{ij}$ increases.
Fig. 1. System Architecture of the Cognitive Multi-hop Relay Network. PU and SU denote the Primary User and the
Secondary User, respectively. The source node in the SU system delivers packets to the destination node via the help of the
linear multi-hop relays. Each node has a cognitive radio to detect and sense the local PU activity. The nodes are numbered
according to the transmission route determined by a certain Layer 3 protocol.

Fig. 2. Signaling flow of the Cognitive Multi-hop Relay Network. PU activity is obtained in the periodic sensing frame.
The transmitting node obtains the instantaneous local channel state from the reverse link. Although each hop may have a
different frame duration, such a design can be accommodated over a synchronous relay network. For example, similar to
IEEE 802.16j, each relay node in the system is synchronized to the symbol boundary. As a result, the time-varying frame
duration (quantized to an integral number of symbols) can be realized on top of the symbol-synchronized relay network.

Traditional Pipeline Channel Reuse

Fig. 3. Illustration of the traditional "regular pipeline spatial reuse" relay protocol in a multi-hop network.

Dynamic Channel Reuse

Fig. 4. Illustration of dynamic spatial reuse when the multi-hop relay chain is partitioned into two continuous segments
by some PU activity realization. We adopt dynamic hop selection within each continuous segment.

Fig. 5. Illustration of the Section Flow Balance criteria.

Performance under different transmit SNR (Path loss exponent=2)

Fig. 6. Average end-to-end throughput versus transmit SNR $P_0$. The PU activity is given by $\Pr(A_m = 0) = 0.15$ and
the path loss exponent is given by 2.

Performance under different PU activity levels

Fig. 7. Average end-to-end throughput versus PU activity $\Pr(A_m = 0)$. The transmit SNR is 30 dB with path loss exponent
given by 3.

Convergence speed of Main Problem Recursion

Fig. 8. Average end-to-end throughput versus number of iterations in Algorithm 2. The PU activity is given by $\Pr(A_m =
0) = 0.15$ and the path loss exponent is given by 3.

Fig. 9. End-to-end normalized throughput of the proposed scheme (normalized by the strictly optimal performance
obtained from brute-force numerical optimization) versus transmit SNR $P_0$ for N=3,4,6 cognitive relay nodes. The PU
activity is given by $\Pr(A_m = 0) = 0.15$ and the path loss exponent is given by 3.

Fig. 10. Algorithm description for the Main Problem